RNA Seq Analysis
Progeria vs wild type VSMC
1 Experimental design
A RNAseq experiment was performed on four groups of vascular smooth muscle cells (VSMCs) collected from wt and mutants mice (model of Progeria) at two different passages :
- WT at passage 8
- WT at passage 12
- Mutant at passage 8
- Mutant at passage 12
For each group, there was 3 replicates. For the wt, the last replicate (at each passages) was performed after the others, so there is a batch effect to take into account for the wt samples. For all groups, the mRNA and miRNA data were extracted and send to sequencing. The data were pre-processed, aligned into the genome and counted by the sequencing platform.
We will first analyse the gene expression profiles of the mutant VSMCs compared to the wt without taking into account the different passages. Then we will do a second analysis comparing the expression profiles of the mutant to the wt at the two passages.
We will start the analyse using the raw gene count matrix (tsv file).
2 Data loading
The following table is the gene count table, each row represent a gene and each column a sample.
#-------------------------- Set working directory --------------------------#
dir = "~/Documents/2.Omics_analysis/1.2021_Progeria/2.mRNA_Analysis/"
dir.create(dir)
#-------------------------- Import data --------------------------#
pathCount = "~/Documents/2.Omics_analysis/1.2021_Progeria/1.Data/mRNA/gene_count_matrix.tsv"
count = as.matrix(read.csv(pathCount,sep=",",row.names="gene_id")) # Gene count
pathfilter = "~/Documents/RNA Seq/2021 Progeria/count/filter_test.csv"
filter_count = as.matrix(read.csv(pathCount,sep=",",row.names="gene_id"))
order = c("X2021A40","X2021A41","X2021A92","X2021A45","X2021A46","X2021A93","X2021A42","X2021A43","X2021A44","X2021A47","X2021A48","X2021A49") # Order of the samples
count = count[,order]
ENS_ID = gsub("\\..*", "",rownames(count)) # Name of the genes (ENSEMBL notation)
SYMBOL_ID = gsub(".*\\|", "",rownames(count)) # Name of the genes (SYMBOL notation)
all_gene_names = as.data.frame(cbind(ENS_ID,SYMBOL_ID))
row.names(count) = ENS_ID
as.data.frame(head(count))#-------------------------- Set variables --------------------------#
alpha = 0.01
lfc_threshold = 1
log2FCDOWN = - 1
log2FCUP = 1
nameOutput = "2021Progeria"
n = 10The next table contains the information about the samples: the passage, the genotype (mutant/wt), and the batch. The column genotype_passage will be used for the analyses that consider the different passages.
#-------------------------- Sample information --------------------------#
coldata <- data.frame(Sample = colnames(count),
Genotype = c("WT", "WT", "WT", "WT","WT", "WT",
"Mutant", "Mutant", "Mutant","Mutant", "Mutant", "Mutant"),
Passage = c("P8","P8","P8","P12","P12","P12",
"P8","P8","P8","P12","P12","P12"),
Genotype_Passage = c("WTP8","WTP8","WTP8","WTP12","WTP12","WTP12",
"MutantP8","MutantP8","MutantP8","MutantP12","MutantP12","MutantP12"),
Batch = c("ONE","ONE","TWO","ONE","ONE","TWO","ONE","ONE","ONE","ONE",
"ONE","ONE"),
row.names = TRUE)
as.data.frame(coldata)3 Data processing
3.1 Creation of DGEList data
In order to compare the expression profiles of our samples, the first
step consists in creating an R object : DGEList. This
object contains matrix of counts and information about the samples. The
design of the analysis depends on the what variable(s) we want to use to
compare our data : here according to the genotype of the mice, while
removing the batch effect.
3.2 Filtering for lowly expressed gene
nrow(filter_count)## [1] 52550
f_keep = (rowSums(filter_count))>= 10
filter_count = filter_count[f_keep,]
nrow(filter_count)## [1] 28253
before = nrow(ddsWTvsMut)
keep = (rowSums(counts(ddsWTvsMut)))>= 10
ddsWTvsMut = ddsWTvsMut[keep,]
after = nrow(ddsWTvsMut)
SYM_ID = SYMBOL_ID[keep]The second step consists in filtering the data. In order to reduce the size of the object, and to increase the speed of our functions, we can remove the rows that have no or very low gene count. Here we apply the following filtering rule: removing rows that have very low count by keeping the gene for which the gene count sum across all sample is above 10. The initial number of genes is 52550 and 28253 after filtering for lowly expressed genes
4 Data exploration
In this section we will explore the data through different visualizations, to have a first look on the potential similarities between samples, and to highlight any batch effect (known or unknown). Before doing so we need to transform the data. The normalized count from DESeq2 are transformed using the variance stabilizing transformation (VST). The point of the VST transformation is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low. Its produces transformed data on the log2 scale which has been normalized with respect to library size.
Then we generate two visualizations, heatmap and PCA plot:
- The heatmap plots the sample-to-sample distances. Here, we apply the dist function to get sample-to-sample euclidean distances. A heatmap of this distance matrix gives us an overview over similarities and dissimilarities between samples.
- The PCA is summarizing the dataset variance. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects.
#-------------------------- With batch effect --------------------------#
#-------------------------- VST Transformation --------------------------#
ddsWTvsMut = DESeq(ddsWTvsMut) # WT vs Mutant
vsd = vst(ddsWTvsMut, blind = FALSE) # Transformation of the normalized counts in order to create at heatmap and PCA plot
#-------------------------- Distance Heatmap --------------------------#
data = t(assay(vsd)) # Extraction of the transformed counts
sampleDists = dist(data)
sampleDistMatrix = as.matrix(sampleDists)
rownames(sampleDistMatrix) = vsd$Genotype_Passage
colnames(sampleDistMatrix) = vsd$Genotype_Passage
colors = colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
hm = pheatmap(sampleDistMatrix,
name = "Euc. dist",
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors,
main = "With batch effect",
silent = TRUE)
#-------------------------- PCA Plot --------------------------#
data2 = as.data.frame(cbind(data, colData(vsd))) # Data table with the samples information the plot
pca_res = prcomp(data)
PCA = autoplot(pca_res, data = data2, col = "Genotype", shape = "Passage",label.size = 3)+
ggtitle("With batch effect")
#-------------------------- After removal of batch effect --------------------------#
data_wobatch = assay(vsd)
mm = model.matrix(~ Passage + Genotype, colData(vsd))
data_wobatch = limma::removeBatchEffect(data_wobatch, batch=vsd$Batch, design=mm)
assay(vsd) = data_wobatch
#-------------------------- Distance Heatmap --------------------------#
data = t(assay(vsd)) # Extraction of the transformed counts
sampleDists = dist(data)
sampleDistMatrix = as.matrix(sampleDists)
rownames(sampleDistMatrix) = vsd$Genotype_Passage
colnames(sampleDistMatrix) = vsd$Genotype_Passage
colors = colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
hm_wobatch = pheatmap(sampleDistMatrix,
name = "Euc. dist",
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors,
main = "Without batch effect",
silent = TRUE)
#-------------------------- PCA Plot --------------------------#
pca_res_adj = prcomp(t(assay(vsd)))
PCA_wobatch = autoplot(pca_res_adj, data = data2, col = "Genotype", shape = "Passage",label.size = 3) +
ggtitle("Without batch effect")
hms = ggarrange(hm[[4]],hm_wobatch[[4]])
PCAs = ggarrange(PCA,PCA_wobatch,
ncol = 2,
common.legend = TRUE,
legend = "bottom")
annotate_figure(PCAs, top = text_grob("PCA plots between samples", face = "bold", size = 14))annotate_figure(hms, top = text_grob("Distance heatmap between samples", face = "bold", size = 14))In the heatmap we can see that the removal of the batch effect reduced the distance between the wt samples, they are all grouped together, separated from the mutant samples. It seems that most of the variation comes from the different genotypes, and some from the passages. Indeed, in the wt samples, we can see that the ones at passages 8 are separated from the ones at passage 12. This separation is less defined in the mutant samples as we can see one mutant at passage 12 and one at passage 8 that are closer to each other than their respective passage group. Moreover, even if there is a clear separation by genotype, there is much more variation in between the mutant samples than in the wt.
The following heatmap is another representation of the sample, considering the genes that vary the most in terms of gene count between samples. Selecting those genes is sufficient to retrieve the same results obtained in the previous heatmap, and in the PCA plot.
topVarGenes = head(order(rowVars(assay(vsd)), decreasing = TRUE),n = 2000) # Without batch effect
hmap_data = (assay(vsd))[topVarGenes,]
hmap_data <- hmap_data - rowMeans(hmap_data)
ha = HeatmapAnnotation(Genotype = colData(vsd)[,1],
Passage = colData(vsd)[,2],
col = list(Genotype = c("WT" = "blueviolet","Mutant" = "red"),
Passage = c("P8" = "green","P12" = "blue")),
gp = gpar(col = "black"))
Heatmap(hmap_data, name = "Gene Count",show_row_names = FALSE,show_row_dend = FALSE ,top_annotation = ha, column_title = "Heatmap of the 2 000 genes with the highest variability")5 Differential expression analysis using DESeq2 : wt vs mutant VSMCs
As said previously, we first want to know what genes are
differential expressed in mutant VSMCs compared to WT VSMCs.
Differential expression analysis tests the hypothesis using a count
table (with raw counts, our gene count matrix). We look for genes for
which the count changes between two conditions more significantly than
expected by chance. Statistical thresholds are used to decide if the
difference between counts is significant. We made our analysis with
DESeq2, an R package. It is available in Bioconductor.
The DESeq2 package is designed for normalization,
visualization and differential analysis of high-dimensional count
data.
The function DESeq() performs three steps :
- Estimation of size factors : This function estimates the size factors using the median ratio method.
- Estimation of dispersion : This function obtains dispersion estimates for Negative Binomial distributed data.
- Negative Binomial GLM fitting and Wald statistics : This function tests for significance of coefficient in a Negative Binomial GLM, using previously calculated size factors and dispersion estimate.
It normalizes data and executes differential analysis. The results
are extracted from dds object using results
function. The alpha parameter is threshold of p-value,
equal to 0.01 for this analysis. Then, results are sorted by adjusted
p-value. The results table has 6 columns :
baseMean: the average of the normalized count values, divided by the size factors, taken over all samples.log2FoldChange: log2 fold change (l2fc) is the effect size estimate. It tells us how much the gene’s expression seems to have changed due to the genotype.lfcSE: the standard error estimate for the l2fc estimate.stat: Wald statisticpvalue: results of the test. Indicates the probability that a fold change as strong as the observed one (Wald test p-value)padj: Benjamini-Hochberg (BH) adjusted p-value (False Discovery Rate)
We created a function that extracts the results and generate plots to visualize them:
The MA-plot represents each gene with a dot. The X-axis is the average expression of normalized counts (A-values). The Y-axis is the l2fc between the two conditions (M-values). Genes expressed with no significant differences in between conditions will cluster around M = 0 value. Genes with an adjusted p-value below 0.01 are in red. The horizontal black lines show applied l2fc threshold to select differentially expressed genes (DEGs)
The Vocalno plot summarizes both fold-change and a mesure of statistical significance (i.e p-value). The Y_axis is the negative log10 pvalue and the X-axis is the l2fc. Each point represents gene. Genes with low p-value appearing towards the top of the plot. The l2fc is used to that changes in both directions (up and down) appear equidistant from the center. Genes in red have pvalue (ajdusted) < 0.01 and abs(l2fc) > 1.
x = list(c("Mutant","WT"))
WTvsMutRes = ddsRes (ddsWTvsMut,"Genotype",x, alpha, nameOutput, lfc_threshold, dir)names(WTvsMutRes) = sapply(list(c("Mutant","WT")), function(c){paste(c[1], "_vs_", c[2], sep = "")})The following table summarizes the number of DEG : total, down- and up- regulated ones. As said previously the genes are selected according to their adjusted pvalue (adjusted) < 0.01 and abs(l2fc) > 1.
WTvsMutRes_sig = sign(WTvsMutRes,log2FCUP,log2FCDOWN,alpha)
WTvsMutRes_sig = WTvsMutRes_sig[[1]]
nDEG = data.frame(DEG = nrow(WTvsMutRes_sig),
DOWN = nrow(WTvsMutRes_sig[WTvsMutRes_sig$log2FoldChange < log2FCDOWN,]),
UP = nrow(WTvsMutRes_sig[WTvsMutRes_sig$log2FoldChange > log2FCUP,]),
row.names = "Mutant vs WT")
nDEG5.1 Visualisation of the DEG
Here are the result of the differential analysis. This table contains
the baseMean, log2FoldChange,
lfcSE, stat, pvalue,
padj of each tested genes. You have the possibility to
apply different thresholds in the table, and search for the results of a
specific gene. Note that for the next parts I will consider the
significant DEG according to the filter we defined previously : padj
< 0.01 and abs(l2fc) > 1.
DEG_table(WTvsMutRes$Mutant_vs_WT,"Mutant vs WT", SYMBOL_ID, ENS_ID)Here is the plot of the normalized expression count (done by DESeq2) of some genes of interest. Note that Parp1 and Lmna are not considered as DEGs according to our pvalue and fold change filters
gene = c("Parp1", "Nt5e","Bst1","Lmna")
vsd_plot = cbind(t(2^(assay(vsd))),coldata)
plot_count(ddsWTvsMut,vsd_plot,gene,SYM_ID,all_gene_names)5.2 Gene enrichement analysis of the DEGs
In this section we’ll analyse the functions of the differentially expressed genes. To do so we’ll use over-representation (or enrichment) analysis. It is a statistical method that determines whether genes from pre-defined sets (ex: those belonging to a specific GO term or KEGG pathway) are present more than would be expected (over-represented) in a subset of the data. Here the considered subset it the DEGs between mutant and wt samples. The fowllowing tables show the 15 most significant enriched terms.
dirGO = dir.create(file.path(dir, "GO Enrichment"))
dirGO = paste(dir,"GO Enrichment/", sep = "")
sigGenes = rownames(WTvsMutRes_sig)
enrich = gost(sigGenes, organism = "mmusculus",sources = "GO:BP",correction_method = "fdr")
write.table(apply(enrich$result,2,as.character),
file = paste(dirGO, "WTvsMut", sep = ""),
quote = FALSE, sep = ";")
term_id = enrich$result$term_id[order(enrich$result$p_value)]
sigGenesUp = rownames(WTvsMutRes_sig[WTvsMutRes_sig$log2FoldChange > lfc_threshold,])
enrich = gost(sigGenesUp, organism = "mmusculus",sources = "GO:BP",correction_method = "fdr")
write.table(apply(enrich$result,2,as.character),
file = paste(dirGO, "WTvsMut_UP", sep = ""),
quote = FALSE, sep = ";")
sigGenesDown = rownames(WTvsMutRes_sig[WTvsMutRes_sig$log2FoldChange < -lfc_threshold,])
enrich = gost(sigGenesDown, organism = "mmusculus",sources = "GO",correction_method = "fdr")
write.table(apply(enrich$result,2,as.character),
file = paste(dirGO, "WTvsMut_DOWN", sep = ""),
quote = FALSE, sep = ";")
write.table(term_id, file = paste(dirGO,"MutvsWT_GO_id.txt", sep = ""), sep = "\t",
row.names = FALSE, col.names = FALSE)In terms of biological processes the terms associated with the DEGs are mostly related to developmental processes (represented by up regulation DEGs) and response to viral infection (represented by down regulation DEGs).
6 Differential expression analysis using DESeq2 : wt vs mutant VSMCs at different passages
Now we’ll compare the gene expression profiles of the mutant cells against the wt cells considering the different passages. The passage can be assimilated to the age of the cells, so the analysis will allows us to know what genes are differential expressed in mutant VSMCs compared to WT VSMCs and if the difference in “age” induces any additional changes.
The process is exactly the same, but when doing the differential analysis with DESeq2 we include the genotype and the passage.
ddsAllCond = DESeqDataSetFromMatrix(countData = count,
colData = coldata,
design = ~ Batch + Genotype_Passage)
ddsAllCond = ddsAllCond[keep,]
SYM_ID = SYMBOL_ID[keep]We make four comparisons :
- The mutant cells at P8 vs the mutant cells at P12
- The mutant cells at P8 vs the wt cells at P8
- The wt cells at P8 vs the wt cells at P12
- The mutant cells at P12 vs the wt cells at P12
The comparison between the same stains (wt vs wt or mutant vs mutant) allow us the extract the gene that are differentially expressed due the difference in passages. The comparison between strains at the same passages allow us to identify the DEG due to the mutation only if they are differentially expressed in both condition or due to the combination of the mutation and passages if they are differentially expressed in only one analysis.
The following plots represent the results of the different analysis.
ddsAllCond = DESeq(ddsAllCond) # WT vs Mutant
contrasts = list(c("MutantP12","MutantP8"),c("MutantP8","WTP8"),
c("WTP12","WTP8"),c("MutantP12","WTP12"))
AllCondRes = ddsRes (ddsAllCond,"Genotype_Passage",contrasts, alpha, nameOutput, lfc_threshold,dir)names(AllCondRes) = sapply(contrasts, function(c){paste(c[1], "_vs_", c[2], sep = "")})The following table sumarize the number of DEG in each analysis, with the same filter as before : adjusted pvalue < 0.01 and abs(l2fc) > 1.
AllCondRes_sig = sign(AllCondRes,log2FCUP,log2FCDOWN,alpha)
names(AllCondRes_sig) = names(AllCondRes)
res = AllCondRes_sig
UP = sapply(X = names(res),function(name, log2FCUP,alpha, res){
UP = nrow(res[[name]][res[[name]]$log2FoldChange > log2FCUP & res[[name]]$padj < alpha & !is.na(res[[name]]$padj),])
}, log2FCUP, alpha, res)
DOWN = sapply(X = names(res),function(name, log2FCDOWN,alpha, res){
DOWN = nrow(res[[name]][res[[name]]$log2FoldChange < log2FCDOWN & res[[name]]$padj < alpha & !is.na(res[[name]]$padj),])
}, log2FCDOWN,alpha, res)
nDEG = data.frame(DE = sapply(AllCondRes_sig, nrow),
DOWN = DOWN,
UP = UP,
row.names = names(AllCondRes_sig))
nDEGWhen comparing the mutant cells to the wt cells, we can see that the analysis at P8 has more DEG then the one at P2. Also, the analysis between the mutant cells at different passages has more DEG then the one with the wt at different passage.
6.1 Visualisation of the DEG in each analysis
The next tables contains the detailed results of the DEG for each analysis.
test = names(AllCondRes_sig)
DEG_table(AllCondRes_sig[[1]],test[1], SYMBOL_ID, ENS_ID)DEG_table(AllCondRes_sig[[2]],test[2], SYMBOL_ID, ENS_ID)DEG_table(AllCondRes_sig[[3]],test[3], SYMBOL_ID, ENS_ID)DEG_table(AllCondRes_sig[[4]],test[4], SYMBOL_ID, ENS_ID)The next plot shows the repartition of the DEG : which are differentially expressed in only one condition and which are differentially expressed in several conditions. We can see that there is 489 DEG common to the analysis of the mutant vs wt at P8 and P12. Showing that those differentially expressed genes must be due to the genotype. While 538 are only in the mutant vs wt at P8, and 253 at P12. It seems that the increase in passage induces a reduction in the number of DEG.
all_deg = lapply (AllCondRes_sig,rownames)
all_deg = as.data.frame(as.matrix(list_to_matrix(all_deg)))
upset(all_deg,colnames(all_deg),sort_intersections_by=c('degree','cardinality'))6.2 Gene enrichement anaylis
DEG_P8 = all_deg[all_deg$MutantP8_vs_WTP8 == 1 &
all_deg$MutantP12_vs_WTP12 == 0 &
all_deg$WTP12_vs_WTP8 == 0 &
all_deg$MutantP12_vs_MutantP8 == 0,]
DEG_P12 = all_deg[all_deg$MutantP8_vs_WTP8 == 0 &
all_deg$MutantP12_vs_WTP12 == 1 &
all_deg$WTP12_vs_WTP8 == 0 &
all_deg$MutantP12_vs_MutantP8 == 0,]
DEG_P8P12 = all_deg[all_deg$MutantP8_vs_WTP8 == 1 &
all_deg$MutantP12_vs_WTP12 == 1 &
all_deg$WTP12_vs_WTP8 == 0 &
all_deg$MutantP12_vs_MutantP8== 0,]
DEG_mutant = all_deg[all_deg$MutantP8_vs_WTP8 == 0 &
all_deg$MutantP12_vs_WTP12 == 0 &
all_deg$WTP12_vs_WTP8 == 0 &
all_deg$MutantP12_vs_MutantP8 == 1,]
DEG_wt = all_deg[all_deg$MutantP8_vs_WTP8 == 0 &
all_deg$MutantP12_vs_WTP12 == 0 &
all_deg$WTP12_vs_WTP8 == 1 &
all_deg$MutantP12_vs_MutantP8 == 0,]
DEG_wtmut = all_deg[all_deg$MutantP8_vs_WTP8 == 0 &
all_deg$MutantP12_vs_WTP12 == 0 &
all_deg$WTP12_vs_WTP8 == 1 &
all_deg$MutantP12_vs_MutantP8 == 1,]
DEG_list = list(DEG_mutant,DEG_wt,DEG_wtmut,DEG_P12,DEG_P8,DEG_P8P12)
names(DEG_list) = c("DEG_mutant","DEG_wt","DEG_wtmut","DEG_P12","DEG_P8","DEG_P8P12")
wt = AllCondRes_sig$WTP8_vs_WTP12[rownames(DEG_list$DEG_wt),]
mut = AllCondRes_sig$MutantP8_vs_MutantP12[rownames(DEG_list$DEG_mutant),]
wtmut = AllCondRes_sig$WTP12_vs_WTP8[rownames(DEG_list$DEG_wtmut),]
mutwt = AllCondRes_sig$MutantP12_vs_MutantP8[rownames(DEG_list$DEG_wtmut),]
P8 = AllCondRes_sig$MutantP8_vs_WTP8[rownames(DEG_list$DEG_P8),]
P12 = AllCondRes_sig$MutantP12_vs_WTP12[rownames(DEG_list$DEG_P12),]
P8P12 = AllCondRes_sig$MutantP8_vs_WTP8[rownames(DEG_list$DEG_P8P12),]
P12P8 = AllCondRes_sig$MutantP12_vs_WTP12[rownames(DEG_list$DEG_P8P12),]
P8P12_up = AllCondRes_sig$MutantP8_vs_WTP8[rownames(DEG_list$DEG_P8P12),]
P8P12_up = P8P12_up[P8P12_up$log2FoldChange > 1 ,]
P8P12_down = AllCondRes_sig$MutantP8_vs_WTP8[rownames(DEG_list$DEG_P8P12),]
P8P12_down = P8P12_down[P8P12_down$log2FoldChange < -1,]
DEG = list(wt,mut,wtmut,P8,P12,P8P12,P8P12_up,P8P12_down)
names(DEG) = c("wt P12 vs P8","mutant P12 vs P8","mutant and wt at P12 vs P8","mutant vs wt at P8","mutant vs wt at P12","mutant vs wt at P12 and P8","mutant vs wt at P12 and P8 (up)", "mutant vs wt at P12 and P8 (down)")
for (i in 1:length(DEG)){
write.csv(DEG[i],
file = paste(dir,names(DEG[i]),".csv", sep = ""),
sep = ";",
row.names = TRUE, col.names = TRUE)}Next are the results of the GO enrichment analysis performed on the DEGs obtained in each analysis.
#-------------------------- GO analysis all DEG --------------------------#
dirGO = dir.create(file.path(dir, "GO Enrichment"))
dirGO = paste(dir,"GO Enrichment/", sep = "")
for (t in test){
sigGenes = rownames(AllCondRes_sig[[t]])
n_deg = length(sigGenes)
enrich = gost(sigGenes, organism = "mmusculus",sources = "GO",correction_method = "fdr")
term_id = enrich$result$term_id[order(enrich$result$p_value)]
write.table(term_id, file = paste(dirGO, t,"_id.txt", sep = ""), sep = "\t",
row.names = FALSE, col.names = FALSE)
}In terms of biological functions, the enriched terms obtained from de DEG of mutant vs wt at each passages are related to similar functions then the ones obtained in the previous analysis (without considering the passages).
We wanted to see what where the enriched GO terms when considering DEG common to both analyse at the different passage, and those specific to each conditions.
#-------------------------- GO analysis per Usetplot group --------------------------#
# DEG_names = c("wt P12 vs P8","mutant P12 vs P8","mutant and wt at P12 vs P8","mutant vs wt at P8","mutant vs wt at P12","mutant vs wt at P12 and P8","mutant vs wt at P12 and P8 (up)", "mutant vs wt at P12 and P8 (down)")
#
# for (d in DEG_names){
# sigGenes = rownames(DEG[[d]])
# n_deg = length(sigGenes)
# enrich = gost(sigGenes, organism = "mmusculus",sources = "GO",correction_method = "fdr")
# if (is.null(nrow(enrich$result)) == FALSE){
# term_id = enrich$result$term_id[order(enrich$result$p_value)]
# write.table(term_id, file = paste(dirGO, d,"_id.txt", sep = ""), sep = "\t",
# row.names = FALSE, col.names = FALSE)
# }
# }